Exploiting Word Transformation in Statistical Machine Translation from Spanish to English
نویسنده
چکیده
This paper investigates the use of morphosyntactic information to reduce datasparseness in statistical machine translation from Spanish to English. In particular, word-alignment training is performed by applying different word transformations using lemmas and stems. It has been observed that stem-based training is better than lemma-based training when up to 1 million running words of data are used. In this paper a new word-alignment training technique is proposed by exploiting syntactically motivated constraints to the parallel data. Preliminary experimental results show that stem-based training with syntactically motivated constraints gives significant improvement in translation performance. Finally, a technique to reduce the impact of out-of-vocabulary words is discussed. The considered task is the translation of Plenary Sessions of the European Parliament.
منابع مشابه
Grouping Multi-Word Expressions According To Part-Of-Speech In Statistical Machine Translation
This paper studies a strategy for identifying and using multi-word expressions in Statistical Machine Translation. The performance of the proposed strategy for various types of multi-word expressions (like nouns or verbs) is evaluated in terms of alignment quality as well as translation accuracy. Evaluations are performed by using real-life data, namely the European Parliament corpus. Results f...
متن کاملThe TCH machine translation system for IWSLT 2008
This paper reports on the first participation of TCH (Toshiba (China) Research and Development Center) at the IWSLT evaluation campaign. We participated in all the 5 translation tasks with Chinese as source language or target language. For Chinese-English and English-Chinese translation, we used hybrid systems that combine rule-based machine translation (RBMT) method and statistical machine tra...
متن کاملData Inferred Multi-word Expressions for Statistical Machine Translation
This paper presents a strategy for detecting and using multi-word expressions in Statistical Machine Translation. Performance of the proposed strategy is evaluated in terms of alignment quality as well as translation accuracy. Evaluations are performed by using the Verbmobil corpus. Results from translation tasks from English-toSpanish and from Spanish-to-English are presented and discussed.
متن کاملImproving Word Alignment with Language Model Based Confidence Scores
This paper describes the statistical machine translation systems submitted to the ACL-WMT 2008 shared translation task. Systems were submitted for two translation directions: English→Spanish and Spanish→English. Using sentence pair confidence scores estimated with source and target language models, improvements are observed on the NewsCommentary test sets. Genre-dependent sentence pair confiden...
متن کاملTowards the Use of Word Stems and Suffixes for Statistical Machine Translation
In this paper we present methods for improving the quality of translation from an inflected language into English by making use of part-of-speech tags and word stems and suffixes in the source language. Results for translations from Spanish and Catalan into English are presented on the LC-STAR trilingual corpus which consists of spontaneously spoken dialogues in the domain of travelling and app...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006